Overview

Dataset statistics

Number of variables: 9
Number of observations: 4177
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 497.8 KiB
Average record size in memory: 122.0 B

Variable types

Categorical: 1
Numeric: 8

Alerts

Length is highly correlated with Diameter and 6 other fields (High correlation)
Diameter is highly correlated with Length and 6 other fields (High correlation)
Height is highly correlated with Length and 6 other fields (High correlation)
Whole weight is highly correlated with Length and 6 other fields (High correlation)
Shucked weight is highly correlated with Length and 6 other fields (High correlation)
Viscera weight is highly correlated with Length and 6 other fields (High correlation)
Shell weight is highly correlated with Length and 6 other fields (High correlation)
Rings is highly correlated with Length and 6 other fields (High correlation)
Length is highly correlated with Diameter and 6 other fields (High correlation)
Diameter is highly correlated with Length and 6 other fields (High correlation)
Height is highly correlated with Length and 6 other fields (High correlation)
Whole weight is highly correlated with Length and 6 other fields (High correlation)
Shucked weight is highly correlated with Length and 5 other fields (High correlation)
Viscera weight is highly correlated with Length and 6 other fields (High correlation)
Shell weight is highly correlated with Length and 6 other fields (High correlation)
Rings is highly correlated with Length and 5 other fields (High correlation)
Length is highly correlated with Diameter and 5 other fields (High correlation)
Diameter is highly correlated with Length and 5 other fields (High correlation)
Height is highly correlated with Length and 6 other fields (High correlation)
Whole weight is highly correlated with Length and 5 other fields (High correlation)
Shucked weight is highly correlated with Length and 5 other fields (High correlation)
Viscera weight is highly correlated with Length and 5 other fields (High correlation)
Shell weight is highly correlated with Length and 6 other fields (High correlation)
Rings is highly correlated with Height and 1 other field (High correlation)
Sex is highly correlated with Length and 6 other fields (High correlation)
Length is highly correlated with Sex and 7 other fields (High correlation)
Diameter is highly correlated with Sex and 7 other fields (High correlation)
Height is highly correlated with Length and 6 other fields (High correlation)
Whole weight is highly correlated with Sex and 7 other fields (High correlation)
Shucked weight is highly correlated with Sex and 7 other fields (High correlation)
Viscera weight is highly correlated with Sex and 7 other fields (High correlation)
Shell weight is highly correlated with Sex and 7 other fields (High correlation)
Rings is highly correlated with Sex and 7 other fields (High correlation)

Reproduction

Analysis started: 2022-02-14 00:03:43.700506
Analysis finished: 2022-02-14 00:03:50.458248
Duration: 6.76 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

Sex
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 236.7 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: M
3rd row: F
4th row: M
5th row: I

Common Values

Value  Count  Frequency (%)
M      1528   36.6%
I      1342   32.1%
F      1307   31.3%

Length

Histogram of lengths of the category

Pie chart (category shares: M 36.6%, I 32.1%, F 31.3%)


Length
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 134
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5239920996
Minimum: 0.075
Maximum: 0.815
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.075
5-th percentile: 0.295
Q1: 0.45
Median: 0.545
Q3: 0.615
95-th percentile: 0.69
Maximum: 0.815
Range: 0.74
Interquartile range (IQR): 0.165

Descriptive statistics

Standard deviation: 0.1200929126
Coefficient of variation (CV): 0.2291884031
Kurtosis: 0.06462097389
Mean: 0.5239920996
Median Absolute Deviation (MAD): 0.08
Skewness: -0.639873269
Sum: 2188.715
Variance: 0.01442230765
Monotonicity: Not monotonic
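
Several of the figures above are derived from one another, which makes them easy to sanity-check. The following is a minimal sketch, not part of the original report: it copies the Length values printed above, together with the observation count of 4177 from the overview, and recomputes the range, IQR, coefficient of variation, variance, and mean. The variable names are illustrative only.

```python
from math import isclose

# Values copied from the Length statistics in this report.
mean_reported = 0.5239920996
std_reported = 0.1200929126
minimum, maximum = 0.075, 0.815
q1, q3 = 0.45, 0.615
total = 2188.715   # "Sum"
n_obs = 4177       # "Number of observations" from the overview

value_range = maximum - minimum          # Range = Maximum - Minimum
iqr = q3 - q1                            # IQR = Q3 - Q1
cv = std_reported / mean_reported        # CV = standard deviation / mean
variance = std_reported ** 2             # Variance = standard deviation squared
mean_from_sum = total / n_obs            # Mean = Sum / number of observations

# Compare against the reported figures, allowing for rounding in the printout.
print(isclose(value_range, 0.74, abs_tol=1e-9))
print(isclose(iqr, 0.165, abs_tol=1e-9))
print(isclose(cv, 0.2291884031, rel_tol=1e-6))
print(isclose(variance, 0.01442230765, rel_tol=1e-6))
print(isclose(mean_from_sum, mean_reported, rel_tol=1e-6))
```

The same relationships can be checked for any of the numeric variables by swapping in their reported values.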
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.625               94     2.3%
0.55                94     2.3%
0.575               93     2.2%
0.58                92     2.2%
0.6                 87     2.1%
0.62                87     2.1%
0.5                 81     1.9%
0.57                79     1.9%
0.63                78     1.9%
0.61                75     1.8%
Other values (124)  3317   79.4%

Minimum 10 values

Value  Count  Frequency (%)
0.075  1      < 0.1%
0.11   1      < 0.1%
0.13   2      < 0.1%
0.135  1      < 0.1%
0.14   2      < 0.1%
0.15   1      < 0.1%
0.155  3      0.1%
0.16   4      0.1%
0.165  5      0.1%
0.17   3      0.1%

Maximum 10 values

Value  Count  Frequency (%)
0.815  1      < 0.1%
0.8    1      < 0.1%
0.78   2      < 0.1%
0.775  2      < 0.1%
0.77   3      0.1%
0.765  2      < 0.1%
0.76   2      < 0.1%
0.755  3      0.1%
0.75   8      0.2%
0.745  5      0.1%

Diameter
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 111
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4078812545
Minimum: 0.055
Maximum: 0.65
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.055
5-th percentile: 0.22
Q1: 0.35
Median: 0.425
Q3: 0.48
95-th percentile: 0.545
Maximum: 0.65
Range: 0.595
Interquartile range (IQR): 0.13

Descriptive statistics

Standard deviation: 0.09923986613
Coefficient of variation (CV): 0.2433057784
Kurtosis: -0.04547558144
Mean: 0.4078812545
Median Absolute Deviation (MAD): 0.065
Skewness: -0.6091981423
Sum: 1703.72
Variance: 0.00984855103
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.45                139    3.3%
0.475               120    2.9%
0.4                 111    2.7%
0.5                 110    2.6%
0.47                100    2.4%
0.48                91     2.2%
0.455               90     2.2%
0.46                89     2.1%
0.44                87     2.1%
0.485               83     2.0%
Other values (101)  3157   75.6%

Minimum 10 values

Value  Count  Frequency (%)
0.055  1      < 0.1%
0.09   1      < 0.1%
0.095  1      < 0.1%
0.1    2      < 0.1%
0.105  4      0.1%
0.11   4      0.1%
0.115  2      < 0.1%
0.12   5      0.1%
0.125  7      0.2%
0.13   8      0.2%

Maximum 10 values

Value  Count  Frequency (%)
0.65   1      < 0.1%
0.63   3      0.1%
0.625  1      < 0.1%
0.62   1      < 0.1%
0.615  1      < 0.1%
0.61   1      < 0.1%
0.605  3      0.1%
0.6    8      0.2%
0.595  4      0.1%
0.59   6      0.1%

Height
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 51
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1395163993
Minimum: 0
Maximum: 1.13
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.075
Q1: 0.115
Median: 0.14
Q3: 0.165
95-th percentile: 0.2
Maximum: 1.13
Range: 1.13
Interquartile range (IQR): 0.05

Descriptive statistics

Standard deviation: 0.04182705661
Coefficient of variation (CV): 0.2998002873
Kurtosis: 76.02550923
Mean: 0.1395163993
Median Absolute Deviation (MAD): 0.025
Skewness: 3.128817379
Sum: 582.76
Variance: 0.001749502664
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
0.15               267    6.4%
0.14               220    5.3%
0.155              217    5.2%
0.175              211    5.1%
0.16               205    4.9%
0.125              202    4.8%
0.165              193    4.6%
0.135              189    4.5%
0.145              182    4.4%
0.12               169    4.0%
Other values (41)  2122   50.8%

Minimum 10 values

Value  Count  Frequency (%)
0      2      < 0.1%
0.01   1      < 0.1%
0.015  2      < 0.1%
0.02   2      < 0.1%
0.025  5      0.1%
0.03   6      0.1%
0.035  6      0.1%
0.04   13     0.3%
0.045  11     0.3%
0.05   18     0.4%

Maximum 10 values

Value  Count  Frequency (%)
1.13   1      < 0.1%
0.515  1      < 0.1%
0.25   3      0.1%
0.24   4      0.1%
0.235  6      0.1%
0.23   10     0.2%
0.225  13     0.3%
0.22   17     0.4%
0.215  31     0.7%
0.21   23     0.6%

Whole weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2429
Distinct (%): 58.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8287421594
Minimum: 0.002
Maximum: 2.8255
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.002
5-th percentile: 0.1259
Q1: 0.4415
Median: 0.7995
Q3: 1.153
95-th percentile: 1.6949
Maximum: 2.8255
Range: 2.8235
Interquartile range (IQR): 0.7115

Descriptive statistics

Standard deviation: 0.4903890182
Coefficient of variation (CV): 0.5917268871
Kurtosis: -0.02364350427
Mean: 0.8287421594
Median Absolute Deviation (MAD): 0.3565
Skewness: 0.5309585633
Sum: 3461.656
Variance: 0.2404813892
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0.2225               8      0.2%
0.196                7      0.2%
0.4775               7      0.2%
0.97                 7      0.2%
1.1345               7      0.2%
0.18                 6      0.1%
0.6765               6      0.1%
0.5805               6      0.1%
0.3245               6      0.1%
0.494                6      0.1%
Other values (2419)  4111   98.4%

Minimum 10 values

Value   Count  Frequency (%)
0.002   1      < 0.1%
0.008   1      < 0.1%
0.0105  1      < 0.1%
0.013   1      < 0.1%
0.014   1      < 0.1%
0.0145  2      < 0.1%
0.015   1      < 0.1%
0.0155  1      < 0.1%
0.0175  1      < 0.1%
0.018   2      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
2.8255  1      < 0.1%
2.7795  1      < 0.1%
2.657   1      < 0.1%
2.555   1      < 0.1%
2.55    1      < 0.1%
2.548   1      < 0.1%
2.526   1      < 0.1%
2.5155  1      < 0.1%
2.5085  1      < 0.1%
2.505   1      < 0.1%

Shucked weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1515
Distinct (%): 36.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3593674886
Minimum: 0.001
Maximum: 1.488
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.001
5-th percentile: 0.0524
Q1: 0.186
Median: 0.336
Q3: 0.502
95-th percentile: 0.7402
Maximum: 1.488
Range: 1.487
Interquartile range (IQR): 0.316

Descriptive statistics

Standard deviation: 0.221962949
Coefficient of variation (CV): 0.6176489417
Kurtosis: 0.5951236784
Mean: 0.3593674886
Median Absolute Deviation (MAD): 0.1585
Skewness: 0.7190979218
Sum: 1501.078
Variance: 0.04926755074
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0.175                11     0.3%
0.2505               10     0.2%
0.165                9      0.2%
0.097                9      0.2%
0.21                 9      0.2%
0.419                9      0.2%
0.302                9      0.2%
0.096                9      0.2%
0.2025               9      0.2%
0.2945               9      0.2%
Other values (1505)  4084   97.8%

Minimum 10 values

Value   Count  Frequency (%)
0.001   1      < 0.1%
0.0025  1      < 0.1%
0.0045  2      < 0.1%
0.005   3      0.1%
0.0055  2      < 0.1%
0.0065  3      0.1%
0.007   1      < 0.1%
0.0075  4      0.1%
0.008   1      < 0.1%
0.0085  1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
1.488   1      < 0.1%
1.351   1      < 0.1%
1.3485  1      < 0.1%
1.253   1      < 0.1%
1.2455  1      < 0.1%
1.2395  2      < 0.1%
1.232   1      < 0.1%
1.1965  1      < 0.1%
1.1945  1      < 0.1%
1.1705  1      < 0.1%

Viscera weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 880
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1805936079
Minimum: 0.0005
Maximum: 0.76
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.0005
5-th percentile: 0.027
Q1: 0.0935
Median: 0.171
Q3: 0.253
95-th percentile: 0.3796
Maximum: 0.76
Range: 0.7595
Interquartile range (IQR): 0.1595

Descriptive statistics

Standard deviation: 0.1096142503
Coefficient of variation (CV): 0.6069663902
Kurtosis: 0.084011749
Mean: 0.1805936079
Median Absolute Deviation (MAD): 0.0795
Skewness: 0.5918521514
Sum: 754.3395
Variance: 0.01201528386
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.1715              15     0.4%
0.196               14     0.3%
0.037               13     0.3%
0.061               13     0.3%
0.0575              13     0.3%
0.2195              13     0.3%
0.156               12     0.3%
0.096               12     0.3%
0.0265              12     0.3%
0.1625              12     0.3%
Other values (870)  4048   96.9%

Minimum 10 values

Value   Count  Frequency (%)
0.0005  2      < 0.1%
0.002   1      < 0.1%
0.0025  2      < 0.1%
0.003   3      0.1%
0.0035  3      0.1%
0.004   1      < 0.1%
0.0045  4      0.1%
0.005   7      0.2%
0.0055  6      0.1%
0.006   2      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
0.76    1      < 0.1%
0.6415  1      < 0.1%
0.59    1      < 0.1%
0.575   1      < 0.1%
0.5745  1      < 0.1%
0.564   1      < 0.1%
0.55    1      < 0.1%
0.541   2      < 0.1%
0.5265  1      < 0.1%
0.526   1      < 0.1%

Shell weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 926
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2388308595
Minimum: 0.0015
Maximum: 1.005
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.0015
5-th percentile: 0.0384
Q1: 0.13
Median: 0.234
Q3: 0.329
95-th percentile: 0.48
Maximum: 1.005
Range: 1.0035
Interquartile range (IQR): 0.199

Descriptive statistics

Standard deviation: 0.1392026695
Coefficient of variation (CV): 0.5828504316
Kurtosis: 0.5319261262
Mean: 0.2388308595
Median Absolute Deviation (MAD): 0.0995
Skewness: 0.6209268251
Sum: 997.5965
Variance: 0.0193773832
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.275               43     1.0%
0.25                42     1.0%
0.315               40     1.0%
0.265               40     1.0%
0.185               40     1.0%
0.17                37     0.9%
0.285               37     0.9%
0.175               36     0.9%
0.3                 36     0.9%
0.22                36     0.9%
Other values (916)  3790   90.7%

Minimum 10 values

Value   Count  Frequency (%)
0.0015  1      < 0.1%
0.003   1      < 0.1%
0.0035  1      < 0.1%
0.004   2      < 0.1%
0.005   12     0.3%
0.006   1      < 0.1%
0.0065  1      < 0.1%
0.007   1      < 0.1%
0.0075  1      < 0.1%
0.008   4      0.1%

Maximum 10 values

Value   Count  Frequency (%)
1.005   1      < 0.1%
0.897   1      < 0.1%
0.885   2      < 0.1%
0.85    1      < 0.1%
0.815   1      < 0.1%
0.7975  1      < 0.1%
0.78    1      < 0.1%
0.76    1      < 0.1%
0.726   1      < 0.1%
0.725   3      0.1%

Rings
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 28
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.933684463
Minimum: 1
Maximum: 29
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 8
Median: 9
Q3: 11
95-th percentile: 16
Maximum: 29
Range: 28
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.224169032
Coefficient of variation (CV): 0.324569302
Kurtosis: 2.330687427
Mean: 9.933684463
Median Absolute Deviation (MAD): 2
Skewness: 1.114101898
Sum: 41493
Variance: 10.39526595
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=28)
Common values

Value              Count  Frequency (%)
9                  689    16.5%
10                 634    15.2%
8                  568    13.6%
11                 487    11.7%
7                  391    9.4%
12                 267    6.4%
6                  259    6.2%
13                 203    4.9%
14                 126    3.0%
5                  115    2.8%
Other values (18)  438    10.5%

Minimum 10 values

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      15     0.4%
4      57     1.4%
5      115    2.8%
6      259    6.2%
7      391    9.4%
8      568    13.6%
9      689    16.5%
10     634    15.2%

Maximum 10 values

Value  Count  Frequency (%)
29     1      < 0.1%
27     2      < 0.1%
26     1      < 0.1%
25     1      < 0.1%
24     2      < 0.1%
23     9      0.2%
22     6      0.1%
21     14     0.3%
20     26     0.6%
19     32     0.8%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
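
The rank-then-correlate recipe above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code pandas-profiling runs: the helper names `average_ranks` and `spearman_rho` are hypothetical, ties are given the average of their positions, and the sample data are the Length and Rings columns from the first ten rows shown in this report.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their
    standard deviations (the 1/n factors cancel, so plain sums are used)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Length and Rings from the first ten sample rows of this report.
length = [0.455, 0.350, 0.530, 0.440, 0.330, 0.425, 0.530, 0.545, 0.475, 0.550]
rings = [15, 7, 9, 10, 7, 8, 20, 16, 9, 19]
print(spearman_rho(length, rings))
```

Because only ranks enter the calculation, any strictly increasing transformation of either variable (for example cubing it) leaves ρ unchanged.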

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
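
As a minimal sketch of that definition, the snippet below computes r in plain Python and checks the invariance under shifts and positive rescaling described above. The function name `pearson_r` is illustrative, the shift and scale constants are arbitrary, and the sample data are the Length and Diameter columns from the first ten rows shown in this report.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


# Length and Diameter from the first ten sample rows of this report.
length = [0.455, 0.350, 0.530, 0.440, 0.330, 0.425, 0.530, 0.545, 0.475, 0.550]
diameter = [0.365, 0.265, 0.420, 0.365, 0.255, 0.300, 0.415, 0.425, 0.370, 0.440]

r = pearson_r(length, diameter)

# Shift and rescale each variable separately (arbitrary positive scale factors).
r_rescaled = pearson_r([200 * v for v in length], [10 + 3 * v for v in diameter])

print(r, r_rescaled)  # the two values agree up to floating-point rounding
```

Note that the invariance holds for positive scale factors; multiplying one variable by a negative constant flips the sign of r.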

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
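
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below in plain Python. This is an illustration only: the function name `kendall_tau_a` is hypothetical, tied pairs count as neither concordant nor discordant, and statistical libraries commonly report the tie-adjusted tau-b instead, so values can differ when ties are present (as they are in the Rings column used here).

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y; the pair counts toward neither.
    return (concordant - discordant) / len(pairs)


# Length and Rings from the first ten sample rows of this report.
length = [0.455, 0.350, 0.530, 0.440, 0.330, 0.425, 0.530, 0.545, 0.475, 0.550]
rings = [15, 7, 9, 10, 7, 8, 20, 16, 9, 19]
print(kendall_tau_a(length, rings))
```

The loop visits every pair, so the cost grows quadratically with the number of observations; that is fine for a ten-row sample but slow for the full 4177 rows.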

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
0  M    0.455   0.365     0.095   0.5140        0.2245          0.1010          0.150         15
1  M    0.350   0.265     0.090   0.2255        0.0995          0.0485          0.070         7
2  F    0.530   0.420     0.135   0.6770        0.2565          0.1415          0.210         9
3  M    0.440   0.365     0.125   0.5160        0.2155          0.1140          0.155         10
4  I    0.330   0.255     0.080   0.2050        0.0895          0.0395          0.055         7
5  I    0.425   0.300     0.095   0.3515        0.1410          0.0775          0.120         8
6  F    0.530   0.415     0.150   0.7775        0.2370          0.1415          0.330         20
7  F    0.545   0.425     0.125   0.7680        0.2940          0.1495          0.260         16
8  M    0.475   0.370     0.125   0.5095        0.2165          0.1125          0.165         9
9  F    0.550   0.440     0.150   0.8945        0.3145          0.1510          0.320         19

Last rows

      Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
4167  M    0.500   0.380     0.125   0.5770        0.2690          0.1265          0.1535        9
4168  F    0.515   0.400     0.125   0.6150        0.2865          0.1230          0.1765        8
4169  M    0.520   0.385     0.165   0.7910        0.3750          0.1800          0.1815        10
4170  M    0.550   0.430     0.130   0.8395        0.3155          0.1955          0.2405        10
4171  M    0.560   0.430     0.155   0.8675        0.4000          0.1720          0.2290        8
4172  F    0.565   0.450     0.165   0.8870        0.3700          0.2390          0.2490        11
4173  M    0.590   0.440     0.135   0.9660        0.4390          0.2145          0.2605        10
4174  M    0.600   0.475     0.205   1.1760        0.5255          0.2875          0.3080        9
4175  F    0.625   0.485     0.150   1.0945        0.5310          0.2610          0.2960        10
4176  M    0.710   0.555     0.195   1.9485        0.9455          0.3765          0.4950        12